- Path: engnews1.Eng.Sun.COM!taumet!clamage
- From: kanze@lts.sel.alcatel.de (James Kanze US/ESC 60/3/141 #40763)
- Newsgroups: comp.std.c++
- Subject: Re: Guarantees concerning references/pointers into string
- Date: 31 Jan 1996 17:39:43 GMT
- Organization: SEL
- Sender: news@lts.sel.alcatel.de
- Approved: clamage@eng.sun.com (comp.std.c++)
- Message-ID: <KANZE.96Jan31183659@slsvewt.lts.sel.alcatel.de>
- References: <KANZE.96Jan25184235@gabi.gabi-soft.fr> <310BF7E7.4DDE@suphys.physics.su.oz.au>
- NNTP-Posting-Host: taumet.eng.sun.com
- Content-Type: text
- In-Reply-To: John Max Skaller's message of 28 Jan 1996 23:24:34 GMT
- Apparently-To: std-c++@ncar.ucar.edu
- Content-Length: 10298
- X-Lines: 256
- Originator: clamage@taumet
-
- In article <310BF7E7.4DDE@suphys.physics.su.oz.au> John Max Skaller
- <maxtal@suphys.physics.su.oz.au> writes:
-
- |> J. Kanze wrote:
-
- |> > a = "A lot of junk" ;
- |> > a.reserve( 100 ) ;
- |> > char* p1 = &a[ 1 ] ;
- |> > // b = a ;
- |> > a.insert( 6 , "---" ) ;
- |> > char* p2 = &a[ 1 ] ;
- |> > cout << "Reserve test: " << (p1 == p2 ? "Passed" : "Failed") << endl ;
- |> >
- |> > As I interpret the standard, the above is required to work.
-
- |> I agree. It should work.
-
-
- |> > The presence of
- |> > `reserve' makes copy on write anything but trivial to implement
- |> > correctly. (The exact case above is actually easy to avoid, and I'm
- |> > surprised that g++ has the error. But try uncommenting the assignment
- |> > in the above code, and it becomes considerably more difficult to avoid
- |> > the problem.)
-
- |> I do NOT understand. There is a problem with
- |> copy on write implementations, but it has nothing to do with
- |> reserve(). Rather, the problem occurs because of aliasing:
-
- It involves reserve, because reserve implicitly guarantees that your
- pointers remain valid despite subsequent write operations. At least,
- that is my somewhat shaky interpretation of the current draft, and a
- major part of my question was: is this interpretation valid?
-
- Without the reserve, I do *not* expect the above to work.
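
For readers who want to try the reserve test against a present-day std::string, here is a minimal sketch. It only reports what the library at hand does. As far as I can tell, the current standard gives std::vector an explicit no-reallocation promise after reserve, but lists insert among the std::string members that may invalidate pointers, so a passing result is an observation about one implementation rather than a guarantee. The function name reserve_keeps_address is made up for this example.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>

// Returns true if &a[1] has the same address before and after an insertion
// that fits within the capacity obtained from reserve().
bool reserve_keeps_address() {
    std::string a = "A lot of junk";
    a.reserve(100);
    const std::size_t cap = a.capacity();
    assert(cap >= 100);

    // Keep the addresses as integers, so that no possibly invalidated
    // pointer value is ever compared or dereferenced.
    const auto before = reinterpret_cast<std::uintptr_t>(&a[1]);
    a.insert(6, "---");                 // 13 + 3 characters, well below cap
    const auto after = reinterpret_cast<std::uintptr_t>(&a[1]);

    return a.capacity() == cap && before == after;
}
```

Printing the result of reserve_keeps_address() gives the "Passed"/"Failed" answer of the original test.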
-
- |> void f(string &x, string const &y) {
- |> char &ch = y[0];
- |> x = "fred";
- |> }
-
- |> Is "ch" assured? Nope:
-
- |> string s = "Hi";
- |> f(s,s);
-
-
- |> Consider now:
-
- |> void f(string &x, string const &y) {
- |> char &ch = y[0];
- |> cout << x[0] << ch;
- |> }
-
- |> Here, ch might _still_ be invalidated even though there
- |> is no write operation -- merely binding a non-const
- |> lvalue into the string _potentially_ allows writing,
- |> and the string class has no choice but to do the copy
- |> immediately. C++ overloads on the constness of the object
- |> and not whether the context is an lvalue or rvalue context.
-
- Agreed. This was the whole point of my question.
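
The overload selection described above can be sketched with a toy copy-on-write class. CowString and shares_with are hypothetical names invented for this illustration; this is not the code of any real library, and a production implementation would differ in many ways.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>

// Hypothetical toy copy-on-write string, for illustration only.
class CowString {
    std::shared_ptr<std::string> rep_;   // representation, possibly shared
public:
    explicit CowString(const char* s)
        : rep_(std::make_shared<std::string>(s)) {}

    // Const overload: selected only when the object expression is const.
    const char& operator[](std::size_t i) const {
        assert(i < rep_->size());
        return (*rep_)[i];
    }

    // Non-const overload: selected for any non-const object, even when the
    // caller only reads. A write might follow, so it must un-share first.
    char& operator[](std::size_t i) {
        assert(i < rep_->size());
        if (rep_.use_count() > 1)
            rep_ = std::make_shared<std::string>(*rep_);
        return (*rep_)[i];
    }

    bool shares_with(const CowString& other) const { return rep_ == other.rep_; }
};

// Mirrors f(x, y) with x and y bound to the same string.
bool read_through_nonconst_detaches() {
    CowString s("Hi");
    CowString other = s;        // the representation is now shared
    CowString& x = s;
    const CowString& y = s;

    const char& ch = y[0];      // refers into the shared representation
    const char c = x[0];        // a pure read, yet the non-const overload runs

    // ch still refers to the old representation (kept alive by `other`),
    // which is no longer the representation of s.
    return c == 'H' && &ch != &y[0] && !s.shares_with(other);
}
```

In this sketch the old representation stays alive because `other` still owns it, so `ch` is stale rather than dangling; it simply no longer refers to a character of `s`.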
-
- As I understand the current draft (and I'm not sure about my
- interpretation), pointers and references into a string are valid until
- `reallocation occurs'. The draft makes very few statements with
- regard to this; offhand, I see nothing that would prevent
- reallocation even when calling a const function (c_str, for
- example).
-
- The draft does guarantee (or I think it does) that after a call to
- reserve, *no* reallocation will take place until the string becomes
- bigger than the reserved size. (The actual wording refers to
- insertion making the string bigger. This seems very vague to me.) In
- practical terms, this means that 1) the copy must be made unique in
- reserve, if it isn't already, and 2) no other string object may be
- made to refer to this copy, since that could trigger a reallocation on
- writing. Implementing 2 is not trivial, or at least, I didn't find a
- trivial way of doing it.
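
One way to read points 1 and 2 is as a per-representation flag: reserve makes the representation unique and marks it as not shareable, and copying from such a string makes a deep copy instead of sharing. The sketch below is only that reading, under stated assumptions: CowString, Rep, shareable and address_survives are hypothetical names, the buffer is a std::vector<char> so that the no-reallocation behaviour within capacity() is the one std::vector documents, and assignment *to* a reserved string is not treated specially.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>
#include <vector>

// Hypothetical toy copy-on-write string, for illustration only.
class CowString {
    struct Rep {
        std::vector<char> chars;
        bool shareable = true;           // cleared by reserve()
    };
    std::shared_ptr<Rep> rep_;

    static std::shared_ptr<Rep> clone(const Rep& r) {
        auto p = std::make_shared<Rep>();
        p->chars = r.chars;              // the fresh copy is shareable again
        return p;
    }
    void detach() {                      // make the representation unique
        if (rep_.use_count() > 1) rep_ = clone(*rep_);
    }
public:
    explicit CowString(const std::string& s) : rep_(std::make_shared<Rep>()) {
        rep_->chars.assign(s.begin(), s.end());
    }
    // Point 2: a reserved representation is never shared, only copied.
    CowString(const CowString& o)
        : rep_(o.rep_->shareable ? o.rep_ : clone(*o.rep_)) {}
    CowString& operator=(const CowString& o) {
        if (this != &o) rep_ = o.rep_->shareable ? o.rep_ : clone(*o.rep_);
        return *this;
    }
    void reserve(std::size_t n) {
        detach();                        // point 1: unique before reserving
        rep_->chars.reserve(n);
        rep_->shareable = false;
    }
    char& operator[](std::size_t i) {
        assert(i < rep_->chars.size());
        detach();
        return rep_->chars[i];
    }
    void insert(std::size_t pos, const std::string& s) {
        assert(pos <= rep_->chars.size());
        detach();
        rep_->chars.insert(rep_->chars.begin() + static_cast<std::ptrdiff_t>(pos),
                           s.begin(), s.end());
    }
};

// The reserve test with the assignment to b left in.
bool address_survives(bool call_reserve) {
    CowString a("A lot of junk");
    if (call_reserve) a.reserve(100);
    char* p1 = &a[1];
    CowString b = a;                     // shares, unless a was reserved
    a.insert(6, "---");                  // un-shares a if it was shared
    char* p2 = &a[1];
    return p1 == p2;
}
```

With the flag, address_survives(true) holds; without the call to reserve, the insertion has to un-share and the address changes. The cost is visible too: every copy of a reserved string is a full copy.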
-
- |> > One further question occurs: when may reallocation (and the resulting
- |> > invalidation of the pointers) occur when reserve has not been called?
- |> > For example, is the following guaranteed to work as expected:
- |> >
- |> > string s ;
- |> > size_t i , j ;
- |> > assert( i < s.size() && j < s.size() ) ;
- |> > s[ i ] = s[ j ] ;
-
- |> > I would like for the last statement to be well defined, but according to
- |> > my reading of the standard, it isn't. Reallocation is allowed in the
- |> > non-const version of operator[] (and must be, if copy on write is to be
- |> > a legal implementation).
-
- |> "must be" doesn't follow -- indexing _could_ return
- |> a proxy object rather than a reference, couldn't it?
-
- Not according to the September draft. (IMHO, this is just a careless
- error. The return type should be implementation defined.)
-
- |> > This operator is called twice in the
- |> > expression, and the compiler could very easily (and legally) generate
- |> > both calls before using either of the results. But if the second call
- |> > reallocates, the reference returned by the first call is invalidated.
- |> > (I cannot actually conceive of an implementation in which the second
- |> > call reallocates, but I cannot find anything in the draft to guarantee
- |> > that it won't.)
-
- |> A naive COW implementation in which the "first" evaluated index operator
- |> is the RHS above and for which a const alias is used will
- |> invoke the CONST indexing operator, while evaluating
- |> the LHS invokes the NONCONST indexing operator and triggers
- |> reallocation. For example:
-
- |> string s = "Hello";
- |> string const &cs = s;
- |> s[0] = cs[0];
-
- |> is almost certain to fail in a naive COW implementation
- |> if the RHS indexing happens to return an lvalue instead
- |> of an rvalue.
-
- Correct.
-
- |> Here's another case:
-
- |> s.insert(const_iterator, "x");
-
- |> [No, there's nothing wrong with inserting at a const iterator.
- |> It is the string object before the dot (.) that must be
- |> non-const to support insertion, NOT the iterator, which merely
- |> marks the place where the insertion is to occur.]
- |> But calling the "insert" method might trigger a reallocation.
-
- |> I would not be surprised at an implementation
- |> which split the representation when ANY iterator
- |> was obtained from the string -- const or not. The same
- |> fix can be applied to indexing (requiring a mutable
- |> pointer inside the string object).
-
- But it doesn't help. Consider the following:
-
- string s1 , s2 ;
- char* pc( &s1[ i ] ) ;
- // At this point, s1 representation
- // is unique.
- s2 = s1 ; // And now it is not.
- assert( pc == &s1[ i ] ) ;
-
- As far as I can tell, most COW implementations will probably fail the above assertion
- because of the potential write through the results of the non-const
- operator[] in the assert. I don't think a proxy will help; if the
- above is to be defined in a logical manner, the proxy will have to
- overload operator&, and the implementation of this will inevitably
- trigger a split of the two representations.
-
- Again, as far as I can tell, an implementation in which this will fail
- is *NOT* forbidden by the draft. Add a call to s1.reserve() before
- taking the first address, however, and I believe that it must work (or
- reserve has no real semantics, and is just a hint). Let's face it,
- I've not modified s1 at all.
-
- |> Is there a rule for writing COW that makes it work???
-
- |> Yes. IF any method returns something that
- |> binds directly to the representation, it must
- |> be split BEFORE the binding is done.
-
- It doesn't help. You also have to remember that you don't have the
- right to share the representation in a later assignment.
-
- |> For example, if the indexing operator uses
- |> a suitable proxy, there's no problem.
- |> Similarly, if the iterators are pairs:
-
- |> (string*, index)
-
- |> there's no problem.
-
- As long as you don't actually modify the string. But consider the
- following:
-
- string s1 , s2 ;
- s1 = "123" ;
- char const* pc( &s1[ 1 ] ) ;
- s2 = s1 ; // Share the image
- s1[ 0 ] = 'a' ; // And the value in pc?
-
- In my own string (before the ISO class), I simply didn't support
- non-const indexing for the longest time. And in practice, I never
- found that to be a restriction. (Strings are not containers, and the
- semantic value of the individual characters depends on the context of
- the string.) When I finally added non-const indexing (user demand), I
- used a proxy class much like what you suggest, and I didn't support
- taking the address.
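
A proxy of this general kind might look like the following sketch. It is not the class described above, whose code is not shown in this post; CowString, CharProxy and shares_with are hypothetical names, and deleting operator& is just one way of refusing to hand out an address.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>

// Hypothetical toy copy-on-write string with a proxy for non-const indexing.
class CowString {
    std::shared_ptr<std::string> rep_;
public:
    class CharProxy {
        CowString& str_;
        std::size_t index_;
    public:
        CharProxy(CowString& s, std::size_t i) : str_(s), index_(i) {}

        // Read access: no copy is needed.
        operator char() const { return (*str_.rep_)[index_]; }

        // Write access: un-share first, then modify.
        CharProxy& operator=(char c) {
            if (str_.rep_.use_count() > 1)
                str_.rep_ = std::make_shared<std::string>(*str_.rep_);
            (*str_.rep_)[index_] = c;
            return *this;
        }
        // Makes s[i] = s[j] copy the character, not the proxy.
        CharProxy& operator=(const CharProxy& o) {
            return *this = static_cast<char>(o);
        }
        // Taking the address of an element is not supported.
        CharProxy* operator&() = delete;
    };

    explicit CowString(const char* s)
        : rep_(std::make_shared<std::string>(s)) {}

    char operator[](std::size_t i) const {
        assert(i < rep_->size());
        return (*rep_)[i];
    }
    CharProxy operator[](std::size_t i) {
        assert(i < rep_->size());
        return CharProxy(*this, i);
    }
    bool shares_with(const CowString& o) const { return rep_ == o.rep_; }
};

bool proxy_copies_only_on_write() {
    CowString s1("123");
    CowString s2 = s1;                       // share the image
    const char c = s1[1];                    // non-const s1, but only a read
    const bool still_shared = s1.shares_with(s2);
    s1[0] = 'a';                             // the write un-shares
    const char first1 = s1[0];
    const char first2 = s2[0];
    return c == '2' && still_shared && !s1.shares_with(s2)
        && first1 == 'a' && first2 == '1';
}
```

Because no reference or pointer into the buffer ever escapes, the string is free to share and un-share as it likes; an expression such as `&s1[1]` simply does not compile.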
-
- In my implementation of the ISO basic_string class, I added a
- guarantee that no const function will invalidate pointers. My
- implementation also maintains the guarantee concerning pointers after
- a reserve; *no* operation on that string will invalidate pointers as
- long as the resulting string will fit into the capacity(). This
- latter guarantee required some fairly complex handling, however, and
- has a definite negative impact on run-time.
-
- |> My current string class MTLstring does the latter AND is a naive
- |> (non-COW) implementation AND allows accesses
- |> "off the end" -- it is highly robust, and VERY hard to break
- |> without doing "obvious" things (like deleting the string).
- |> It permits you to insert or erase in the string
- |> and will NOT invalidate any iterators. In fact,
- |> my iterators CANNOT be invalidated except by deleting
- |> the string. They may, of course, magically point to
- |> a new character after an insertion. [In fact,
- |> my iterators can be anchored at EITHER end of the
- |> string, so a "end()-1" iterator is NOT moved
- |> by an insertion at end()-2, because it's anchored
- |> to the RHS end of the string, not the LHS end.]
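
The (string*, index) idea with a choice of anchoring end can be sketched roughly as below. AnchoredIter is a hypothetical name, and this is not MTLstring's actual interface, which is not shown in the post; std::string merely stands in for the underlying string.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical iterator stored as (string*, offset) instead of a raw pointer.
struct AnchoredIter {
    std::string* str;
    std::size_t offset;   // distance from the anchoring end
    bool from_end;        // false: counted from begin(); true: back from end()

    std::size_t index() const {
        return from_end ? str->size() - offset : offset;
    }
    char& operator*() const {
        assert(index() < str->size());
        return (*str)[index()];              // looked up afresh on every use
    }
};

bool anchored_iterators_survive_growth() {
    std::string s = "abcd";
    AnchoredIter front{&s, 1, false};        // second character
    AnchoredIter back{&s, 1, true};          // end() - 1
    const bool ok_before = *front == 'b' && *back == 'd';

    // Grow far beyond the original capacity; raw pointers into the old
    // buffer would be useless after this.
    s.insert(2, std::string(1000, 'x'));
    const bool ok_after = *front == 'b' && *back == 'd';

    // An insertion in front of a begin-anchored iterator makes it designate
    // a different character; the end-anchored one is unaffected.
    s.insert(0, "Z");
    return ok_before && ok_after && *front == 'a' && *back == 'd';
}
```

The iterator never holds an address, so nothing short of destroying the string invalidates it; the price is an extra indirection and index computation on every dereference.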
-
- I did it even simpler. My string class initially had *no* non-const
- functions other than assign, and no way of getting an internal address
- of the string. Very quickly, I found that I needed the equivalent of
- c_str(), but for a long time, that was the only compromise.
-
- |> The price is inefficiency: read/write can
- |> be made very efficient by grabbing a pointer
- |> to the raw array, but duplicate copying
- |> makes the class unsuitable for some applications.
-
- |> But I'm just not interested in chasing the kind of
- |> horrible problems that fragile implementations have, in a class
- |> I personally conceive as a formatting
- |> and message passing utility.
-
- I agree here. My string class was never written with performance in
- mind, simply because my programs are not string oriented. The one
- exception is a program to format comments; all of the formatting is
- done with my string class. But how many lines will a comment have,
- anyway?
-
- |> A reasonable compromise for a COW implementation
- |> is to make users of the indexing operators
- |> and STL iterators pay for using non-OO syntax:
-
- |> s.put(index, chr)
-
- |> does not suffer from these problems, for example.
-
- See above. If you allow external pointers, and you guarantee that the
- above will not invalidate them (in general, or because of an explicit
- request from the user, like reserve), then you have a problem.
- --
- James Kanze Tel.: (+33) 88 14 49 00 email: kanze@gabi-soft.fr
- GABI Software, Sarl., 8 rue des Francs-Bourgeois, F-67000 Strasbourg, France
- Conseils, études et réalisations en logiciel orienté objet --
- -- A la recherche d'une activité dans une region francophone
-
-
- [ comp.std.c++ is moderated. Submission address: std-c++@ncar.ucar.edu.
- Contact address: std-c++-request@ncar.ucar.edu. The moderation policy
- is summarized in http://dogbert.lbl.gov/~matt/std-c++/policy.html. ]
-
-